Introduction

On this page, we explore a mortality dataset and a poverty dataset and use them to render a plotly graph. To render the graph, we first need to load the plotly library along with the tidyverse. If plotly is not installed yet, run install.packages("plotly") once beforehand.

library(tidyverse)
library(dplyr)
library(plotly)

With the libraries loaded, we can move on to data loading.

Data Loading

This part is simple. We import the two data files, reshape them into tidy (long) format, join them on country and year, and remove rows that are not countries, such as regional and income-group aggregates (haha, good trick Elias!).

# read in the data
mortality <- read_csv("mortality.csv")
poverty <- read_csv("poverty.csv")

# explore the data
head(mortality)
## # A tibble: 6 x 59
##   country `1960` `1961` `1962` `1963` `1964` `1965` `1966` `1967` `1968` `1969`
##   <chr>    <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>
## 1 Afghan~   356.   350.   345.   339.   334.   328.    323   318.   312.   307.
## 2 Albania    NA     NA     NA     NA     NA     NA      NA    NA     NA     NA 
## 3 Algeria   242.   242.   243.   244.   245    246.    247   247.   246.   244 
## 4 Andorra    NA     NA     NA     NA     NA     NA      NA    NA     NA     NA 
## 5 Angola     NA     NA     NA     NA     NA     NA      NA    NA     NA     NA 
## 6 Antigu~    NA     NA     NA     NA     NA     NA      NA    NA     NA     NA 
## # ... with 48 more variables: `1970` <dbl>, `1971` <dbl>, `1972` <dbl>,
## #   `1973` <dbl>, `1974` <dbl>, `1975` <dbl>, `1976` <dbl>, `1977` <dbl>,
## #   `1978` <dbl>, `1979` <dbl>, `1980` <dbl>, `1981` <dbl>, `1982` <dbl>,
## #   `1983` <dbl>, `1984` <dbl>, `1985` <dbl>, `1986` <dbl>, `1987` <dbl>,
## #   `1988` <dbl>, `1989` <dbl>, `1990` <dbl>, `1991` <dbl>, `1992` <dbl>,
## #   `1993` <dbl>, `1994` <dbl>, `1995` <dbl>, `1996` <dbl>, `1997` <dbl>,
## #   `1998` <dbl>, `1999` <dbl>, `2000` <dbl>, `2001` <dbl>, `2002` <dbl>,
## #   `2003` <dbl>, `2004` <dbl>, `2005` <dbl>, `2006` <dbl>, `2007` <dbl>,
## #   `2008` <dbl>, `2009` <dbl>, `2010` <dbl>, `2011` <dbl>, `2012` <dbl>,
## #   `2013` <dbl>, `2014` <dbl>, `2015` <dbl>, `2016` <dbl>, `2017` <dbl>
head(poverty)
## # A tibble: 6 x 35
##   country `1996` `2002` `2005` `2008` `2012` `2000` `1986` `1987` `1991` `1992`
##   <chr>    <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>
## 1 Albania  12.4   16.6    9.79   6.11   6.79   NA       NA  NA     NA     NA   
## 2 Angola   NA     NA     NA     54.5   NA      54.2     NA  NA     NA     NA   
## 3 Argent~   8.98  25.4   11.4    6.79   3.69   12.1      0   2.22   3.91   4.48
## 4 Armenia  40.4   49.4   24.7   12.9   17.4    NA       NA  NA     NA     NA   
## 5 Azerba~  NA      0.24   0      2.51  NA      NA       NA  NA     NA     NA   
## 6 Bangla~  NA     NA     63.0   NA     NA      70.1     NA  NA     82.4   NA   
## # ... with 24 more variables: `1993` <dbl>, `1994` <dbl>, `1995` <dbl>,
## #   `1997` <dbl>, `1998` <dbl>, `1999` <dbl>, `2001` <dbl>, `2003` <dbl>,
## #   `2004` <dbl>, `2006` <dbl>, `2007` <dbl>, `2009` <dbl>, `2010` <dbl>,
## #   `2011` <dbl>, `2013` <dbl>, `2014` <dbl>, `1983` <dbl>, `1985` <dbl>,
## #   `1988` <dbl>, `1990` <dbl>, `1981` <dbl>, `1982` <dbl>, `1984` <dbl>,
## #   `1989` <dbl>
# create tidy datasets
mortality_tidy <- mortality %>% 
  pivot_longer(cols = !country, names_to = "year", values_to = "mrate")

poverty_tidy <- poverty %>%
  pivot_longer(cols = !country, names_to = "year", values_to = "prate")

# regional and income-group aggregates that are not countries
aggregates <- c("Europe & Central Asia", "East Asia & Pacific",
                "Middle East & North Africa", "Sub-Saharan Africa",
                "Latin America & Caribbean", "Low income", "Low & middle income",
                "Lower middle income", "Middle income", "Upper middle income",
                "Fragile and conflict affected situations", "IDA total",
                "IDA only", "IDA blend", "IDA & IBRD total", "IBRD only")

# join the datasets, drop incomplete rows, and remove the aggregates
measurements <- inner_join(mortality_tidy, poverty_tidy, by = c("country", "year")) %>%
  na.omit() %>%
  filter(!country %in% aggregates)

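To see what the reshape, join, and filter steps do, here is a minimal sketch on made-up toy data. The values, the toy table names, and the tidy() helper are all hypothetical and are not taken from mortality.csv or poverty.csv; only the sequence of steps mirrors the code above.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy tables standing in for mortality.csv and poverty.csv
mortality_toy <- tibble(
  country = c("Albania", "Low income"),
  `2000` = c(27.2, 140.0),
  `2001` = c(25.8, NA)
)
poverty_toy <- tibble(
  country = c("Albania", "Low income"),
  `2000` = c(NA, 60.1),
  `2001` = c(16.6, 58.3)
)

# Hypothetical helper: wide (one column per year) to long (one row per country-year)
tidy <- function(df, value_name) {
  pivot_longer(df, cols = !country, names_to = "year", values_to = value_name)
}

toy <- inner_join(tidy(mortality_toy, "mrate"), tidy(poverty_toy, "prate"),
                  by = c("country", "year")) %>%
  na.omit() %>%                           # keep rows where both rates are present
  filter(!country %in% c("Low income"))   # drop the non-country aggregate

toy
```

Only the country-year pairs that have both a mortality rate and a poverty rate survive, which is why the joined data is much smaller than either input.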
Creating a regression model

With the data in place, let’s quickly fit our regression model, a simple linear regression of mortality rate on poverty rate:

model <- lm(mrate ~ prate, measurements)

summary(model)
## 
## Call:
## lm(formula = mrate ~ prate, data = measurements)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -79.878 -12.445  -1.050   7.557 180.451 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  6.32228    1.21747   5.193 2.46e-07 ***
## prate        1.33676    0.03063  43.643  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 28.27 on 1113 degrees of freedom
## Multiple R-squared:  0.6312, Adjusted R-squared:  0.6308 
## F-statistic:  1905 on 1 and 1113 DF,  p-value: < 2.2e-16
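
As a rough reading of the summary above: the fitted line is mrate = 6.32228 + 1.33676 * prate, and the R-squared of 0.6312 says that poverty rate accounts for about 63% of the variance in mortality rate in this sample. The sketch below plugs a few poverty rates into that line. The predict_mrate() helper is hypothetical and only copies the reported coefficients; with the fitted model in memory, predict(model, newdata = data.frame(prate = 20)) should give the same kind of answer.

```r
# Coefficients copied from the summary(model) output above
intercept <- 6.32228
slope <- 1.33676

# Hypothetical helper: predicted mortality rate for a given poverty rate
predict_mrate <- function(prate) intercept + slope * prate

# Predictions for poverty rates of 0, 10, and 20
predict_mrate(c(0, 10, 20))
```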

Including Plots

Now that we have the data and the model, all that’s left to do is to plot the figure (fingers crossed, let’s hope this works!!)

## No scatter mode specifed:
##   Setting the mode to markers
##   Read more about this attribute -> https://plot.ly/r/reference/#scatter-mode
## Warning: `arrange_()` is deprecated as of dplyr 0.7.0.
## Please use `arrange()` instead.
## See vignette('programming') for more help
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_warnings()` to see where this warning was generated.
## No trace type specified:
##   Based on info supplied, a 'scatter' trace seems appropriate.
##   Read more about this trace type -> https://plot.ly/r/reference/#scatter
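
The plotting code itself does not appear in the output above, so here is a minimal sketch of how a figure like this could be built. It is an assumption, not the original chunk: the toy data frame is a made-up stand-in for measurements, and the axis titles and trace names are illustrative. Setting type and mode explicitly should also avoid the "No trace type specified" and "No scatter mode specified" messages.

```r
library(plotly)

# Hypothetical stand-in for the `measurements` data frame built earlier
toy <- data.frame(
  country = c("A", "B", "C", "D"),
  prate = c(2, 10, 30, 60),
  mrate = c(8, 22, 45, 90)
)

# Fit the same kind of model and keep the fitted values for the line
toy_model <- lm(mrate ~ prate, data = toy)
toy$fit <- fitted(toy_model)

# Scatter of the observations plus the regression line as a second trace
fig <- plot_ly(toy, x = ~prate, y = ~mrate, text = ~country,
               type = "scatter", mode = "markers", name = "Countries") %>%
  add_trace(x = ~prate, y = ~fit, type = "scatter", mode = "lines",
            name = "Linear fit") %>%
  layout(xaxis = list(title = "Poverty rate"),
         yaxis = list(title = "Mortality rate"))

fig
```

To use it on the real data, swap toy for measurements and toy_model for model.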

And that’s how it works, folks!